
    Cross-Entropy Clustering

    We construct a cross-entropy clustering (CEC) theory which finds the optimal number of clusters by automatically removing groups which carry no information. Moreover, our theory gives a simple and efficient criterion for verifying cluster validity. Although CEC can be built on an arbitrary family of densities, in the most important case of Gaussian CEC: the division into clusters is affine invariant; the clustering tends to divide the data into ellipsoid-type shapes; and the approach is computationally efficient, as the Hartigan approach can be applied. We also study, with particular attention, clustering based on spherical Gaussian densities and on Gaussian densities with covariance s·I. In the latter case we show that as s converges to zero we obtain the classical k-means clustering.
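    Below is a minimal sketch, in Python, of the per-cluster Gaussian cross-entropy cost that the abstract describes: each cluster contributes its weight times the sum of its negative log-weight and the differential entropy of its fitted Gaussian, and clusters whose weight becomes negligible can be dropped. The function and argument names are illustrative, not the authors' reference implementation, and degenerate clusters (too few points for a covariance estimate) are ignored.

```python
import numpy as np

def gaussian_cec_cost(X, labels):
    """Cross-entropy cost of a partition under per-cluster Gaussian models.

    Sketch of the criterion from the abstract: cluster k with weight p_k and
    ML covariance Sigma_k contributes
        p_k * (-ln p_k + d/2 * ln(2*pi*e) + 1/2 * ln det Sigma_k).
    Removing clusters whose weight falls below a threshold is how CEC is
    described to select the number of clusters.
    """
    n, d = X.shape
    cost = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        p_k = len(Xk) / n
        Sigma_k = np.cov(Xk, rowvar=False) + 1e-9 * np.eye(d)  # regularized ML covariance
        cost += p_k * (-np.log(p_k)
                       + 0.5 * d * np.log(2 * np.pi * np.e)
                       + 0.5 * np.linalg.slogdet(Sigma_k)[1])
    return cost
```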

    Extreme Entropy Machines: Robust information theoretic classification

    Most existing classification methods aim at minimizing empirical risk (a simple point-based error measured with a loss function) with added regularization. We propose to approach this problem in a more information-theoretic way by investigating the applicability of entropy measures as a classification model objective function. We focus on quadratic Renyi's entropy and the connected Cauchy-Schwarz Divergence, which leads to the construction of Extreme Entropy Machines (EEM). The main contribution of this paper is a model based on information-theoretic concepts which, on the one hand, gives a new, entropic perspective on known linear classifiers and, on the other, leads to the construction of a very robust method competitive with state-of-the-art non-information-theoretic ones (including Support Vector Machines and Extreme Learning Machines). Evaluation on numerous problems, ranging from small, simple ones from the UCI repository to large (hundreds of thousands of samples), extremely unbalanced (up to 100:1 class ratios) datasets, shows the wide applicability of the EEM to real-life problems and that it scales well.
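    As a concrete illustration of the divergence named above, the sketch below computes the Cauchy-Schwarz divergence between Gaussian density estimates of two classes, which is closed-form for Gaussians and connected to quadratic Renyi entropy. This is only an assumed illustration of the objective quantity, not the EEM training procedure itself; the helper names are hypothetical and multivariate (d >= 2) data is assumed.

```python
import numpy as np

def log_gaussian_product_integral(m1, S1, m2, S2):
    """log of the integral of the product of two Gaussian densities N(m1,S1), N(m2,S2)."""
    d = len(m1)
    S = S1 + S2
    diff = m1 - m2
    _, logdet = np.linalg.slogdet(S)
    return (-0.5 * d * np.log(2 * np.pi) - 0.5 * logdet
            - 0.5 * diff @ np.linalg.solve(S, diff))

def cauchy_schwarz_divergence(X_pos, X_neg):
    """Cauchy-Schwarz divergence between Gaussian estimates of two classes.

    D_CS(p, q) = -ln \int p q + 1/2 ln \int p^2 + 1/2 ln \int q^2,
    evaluated in closed form for the fitted Gaussians.
    """
    m1, S1 = X_pos.mean(0), np.atleast_2d(np.cov(X_pos, rowvar=False))
    m2, S2 = X_neg.mean(0), np.atleast_2d(np.cov(X_neg, rowvar=False))
    log_pq = log_gaussian_product_integral(m1, S1, m2, S2)
    log_pp = log_gaussian_product_integral(m1, S1, m1, S1)
    log_qq = log_gaussian_product_integral(m2, S2, m2, S2)
    return -log_pq + 0.5 * log_pp + 0.5 * log_qq
```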

    Paraconvex, but not strongly, Takagi functions

    An important open problem in the theory of approximate convexity asks whether every paraconvex function on a bounded interval is strongly paraconvex. Our aim is to show that this is not the case. To do this we need the following generalization of the Takagi function. For a sequence a = (a_i)_{i∈ℕ} ⊂ ℝ₊ we consider the Takagi-like function T_(a)(x) := ∑_{i=1}^{∞} a_i dist(x, (1/2^{i-1})ℤ) for x ∈ ℝ. We give convenient conditions for verifying whether T_(a) is paraconvex or strongly paraconvex. This enables us to construct a class of paraconvex functions which are not strongly paraconvex.
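    The Takagi-like function above is easy to evaluate numerically by truncating the series; the sketch below does exactly that. The function name and the truncation length are illustrative assumptions, not part of the paper.

```python
def takagi_like(x, a):
    """Truncated Takagi-like sum T_(a)(x) = sum_i a_i * dist(x, 2^{-(i-1)} Z).

    `a` is a finite prefix of the coefficient sequence (a_i); the infinite
    series is truncated after len(a) terms.
    """
    total = 0.0
    for i, a_i in enumerate(a, start=1):
        period = 2.0 ** (-(i - 1))           # spacing of the lattice 2^{-(i-1)} Z
        r = x % period
        total += a_i * min(r, period - r)    # distance to the nearest lattice point
    return total

# With a_i = 1 for all i this reduces (in one common convention) to the
# classical Takagi (blancmange) function:
# takagi_like(0.3, [1.0] * 40)
```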

    LOSSGRAD: automatic learning rate in gradient descent

    In this paper, we propose a simple, fast and easy to implement algorithm, LOSSGRAD (locally optimal step-size in gradient descent), which automatically modifies the step-size in gradient descent during neural network training. Given a function $f$, a point $x$, and the gradient $\nabla_x f$ of $f$, we aim to find the step-size $h$ which is (locally) optimal, i.e. satisfies $h = \arg\min_{t \geq 0} f(x - t \nabla_x f)$. Making use of a quadratic approximation, we show that the algorithm satisfies the above assumption. We experimentally show that our method is insensitive to the choice of initial learning rate while achieving results comparable to other methods.
    Comment: TFML 201
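    The sketch below illustrates the stated objective with a generic one-dimensional quadratic fit along the negative gradient: it approximates argmin_{t >= 0} f(x - t * grad) from the loss at the current point, the gradient norm, and one trial step. This is an assumed line-search illustration of the criterion in the abstract, not the authors' LOSSGRAD update rule, and the function and parameter names are hypothetical.

```python
import numpy as np

def quadratic_step_size(f, x, grad, h_prev=1e-2):
    """Approximate h* = argmin_{t >= 0} f(x - t * grad) via a parabola fit.

    Fits phi(t) ~ f(x - t * grad) through phi(0) = f(x), phi'(0) = -||grad||^2
    and one trial evaluation phi(h_prev), then returns the minimizer of the fit.
    """
    g2 = float(np.dot(grad, grad))           # -phi'(0)
    phi0 = f(x)
    phi_h = f(x - h_prev * grad)
    # phi(t) ≈ phi0 - g2 * t + c * t^2  =>  solve for c from the trial point
    c = (phi_h - phi0 + g2 * h_prev) / (h_prev ** 2)
    if c <= 0:                                # no positive curvature detected: grow the step
        return 2.0 * h_prev
    return g2 / (2.0 * c)                     # minimizer of the fitted parabola

# Usage on a toy quadratic f(x) = 0.5 * x^T A x (gradient A x):
# A = np.diag([1.0, 10.0]); f = lambda x: 0.5 * x @ A @ x
# x = np.array([1.0, 1.0]); g = A @ x
# h = quadratic_step_size(f, x, g); x_new = x - h * g
```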